AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Acevedo, Santiago, Mascaretti, Andrea, Rende, Riccardo, Mahaut, Matéo, Baroni, Marco, Laio, Alessandro

A quantitative analysis of semantic information in deep representations of text and images

arXiv.org Artificial Intelligence, Dec-8-2025

Deep neural networks are known to develop similar representations for semantically related data, even when they belong to different domains, such as an image and its description, or the same text in different languages. We present a method for quantitatively investigating this phenomenon by measuring the relative information content of the representations of semantically related data and probing how it is encoded into multiple tokens of large language models (LLMs) and vision transformers. Looking first at how LLMs process pairs of translated sentences, we identify inner "semantic" layers containing the most language-transferable information. We find moreover that, on these layers, a larger LLM (DeepSeek-V3) extracts significantly more general information than a smaller one (Llama3.1-8B). Semantic information of English text is spread across many tokens and it is characterized by long-distance correlations between tokens and by a causal left-to-right (i.e., past-future) asymmetry. We also identify layers encoding semantic information within visual transformers. We show that caption representations in the semantic layers of LLMs predict visual representations of the corresponding images. We observe significant and model-dependent information asymmetries between image and text representations.

large language model, machine learning, natural language

2505.17101

Country:

Asia (0.93)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Nov-21-2025, 15:33:50 GMT

Disentangling factors of variation in deep representation using adversarial training

We propose a deep generative model for learning to distill the hidden factors of variation within a set of labeled observations into two complementary codes. One code describes the factors of variation relevant to solving a specified task. The other code describes the remaining factors of variation that are irrelevant to solving this task. The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category. Concrete examples include multiple images of the same object from different viewpoints, or multiple speech samples from the same speaker. In both of these instances, the factors of variation irrelevant to classification are implicitly expressed by intra-class variabilities, such as the relative position of an object in an image, or the linguistic content of an utterance. Most existing approaches for solving this problem rely heavily on having access to pairs of observations only sharing a single factor of variation, e.g.

adversarial training, disentangling factor, variation

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.38)

Maharjan, Rahul Singh, Romeo, Marta, Cangelosi, Angelo

Attributes-aware Visual Emotion Representation Learning

arXiv.org Artificial Intelligence, Apr-10-2025

Visual emotion analysis or recognition has gained considerable attention due to the growing interest in understanding how images can convey rich semantics and evoke emotions in human perception. However, visual emotion analysis poses distinctive challenges compared to traditional vision tasks, especially due to the intricate relationship between general visual features and the different affective states they evoke, known as the affective gap. Researchers have used deep representation learning methods to address this challenge of extracting generalized features from entire images. However, most existing methods overlook the importance of specific emotional attributes such as brightness, colorfulness, scene understanding, and facial expressions. Through this paper, we introduce A4Net, a deep representation network to bridge the affective gap by leveraging four key attributes: brightness (Attribute 1), colorfulness (Attribute 2), scene context (Attribute 3), and facial expressions (Attribute 4). By fusing and jointly training all aspects of attribute recognition and visual emotion analysis, A4Net aims to provide a better insight into emotional content in images. Experimental results show the effectiveness of A4Net, showcasing competitive performance compared to state-of-the-art methods across diverse visual emotion datasets. Furthermore, visualizations of activation maps generated by A4Net offer insights into its ability to generalize across different visual emotion datasets.

machine learning, natural language, recognition

2504.06578

Country:

Europe > United Kingdom (0.46)
Europe > Switzerland (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.78)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

Neural Information Processing Systems, Feb-11-2025, 20:34:48 GMT

Disentangling factors of variation in deep representation using adversarial training

adversarial training, representation, variation

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Neural Information Processing Systems, Jan-21-2025, 16:24:26 GMT

Review for NeurIPS paper: A Bayesian Nonparametrics View into Deep Representations

The model is used to investigate the complexity of representations through the KL divergence between the max entropy reference and the model posterior. The reviewers generally felt the paper made a variety of interesting observations. In a final version, the authors are encouraged to read and account for updates to reviewer comments after the rebuttal, and to discuss https://arxiv.org/abs/2002.08791, which provides a complementary Bayesian nonparametric perspective on deep neural networks.
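The complexity measure mentioned in this review can be illustrated with a toy calculation. The sketch below assumes a discrete distribution over K outcomes, for which the maximum-entropy reference is uniform, and it assumes the direction KL(posterior || reference); neither assumption is confirmed by the review, and the paper's own model is not reproduced here. The function names are hypothetical. In this discrete case the divergence reduces to log K minus the entropy of the posterior.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_to_uniform(p):
    """KL(p || u), where u is the uniform (maximum-entropy) distribution.

    Assumption: p is a discrete posterior over len(p) outcomes.
    The result equals log(K) - entropy(p).
    """
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    k = len(p)
    return sum(pi * math.log(pi * k) for pi in p if pi > 0)
```

A posterior that matches the uniform reference gives a divergence of zero, while a posterior concentrated on a single outcome gives the maximum value, log K.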

bayesian nonparametric view, deep representation, neurips paper

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.84)

Bui, Ha Manh, Mallada, Enrique, Liu, Anqi

Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits

arXiv.org Machine Learning, Nov-8-2024

By leveraging the representation power of deep neural networks, neural upper confidence bound (UCB) algorithms have shown success in contextual bandits. To further balance the exploration and exploitation, we propose Neural-$\sigma^2$-LinearUCB, a variance-aware algorithm that utilizes $\sigma^2_t$, i.e., an upper bound of the reward noise variance at round $t$, to enhance the uncertainty quantification quality of the UCB, resulting in a regret performance improvement. We provide an oracle version for our algorithm characterized by an oracle variance upper bound $\sigma^2_t$ and a practical version with a novel estimation for this variance bound. Theoretically, we provide rigorous regret analysis for both versions and prove that our oracle algorithm achieves a better regret guarantee than other neural-UCB algorithms in the neural contextual bandits setting. Empirically, our practical method enjoys a similar computational efficiency, while outperforming state-of-the-art techniques by having a better calibration and lower regret across multiple standard settings, including on the synthetic, UCI, MNIST, and CIFAR-10 datasets.
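The idea of using a per-round variance bound inside a linear UCB rule can be sketched as follows. This is a generic variance-weighted linear UCB, not the paper's Neural-$\sigma^2$-LinearUCB: it assumes the feature vector of each arm is given (no neural network is trained), that a variance bound `sigma2` is supplied by the caller at each round, and that the exploration weight `beta` is a fixed constant. The class and method names are hypothetical. Each observation is weighted by 1/sigma2 in a ridge regression, and the inverse design matrix is maintained with the Sherman-Morrison identity.

```python
import math


class VarianceWeightedLinUCB:
    """Linear UCB with observations weighted by 1 / sigma2 (illustrative only)."""

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.dim = dim
        self.beta = beta  # exploration weight (assumed constant here)
        # Inverse of the regularised design matrix, initially (lam * I)^-1.
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # weighted sum of reward * features

    def _matvec(self, v):
        return [sum(row[j] * v[j] for j in range(self.dim)) for row in self.A_inv]

    def theta(self):
        """Weighted ridge-regression estimate of the reward parameters."""
        return self._matvec(self.b)

    def width(self, x):
        """Confidence width sqrt(x^T A^-1 x) for feature vector x."""
        ax = self._matvec(x)
        return math.sqrt(max(sum(xi * ai for xi, ai in zip(x, ax)), 0.0))

    def ucb(self, x):
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        return mean + self.beta * self.width(x)

    def select(self, arms):
        """Return the index of the arm with the highest UCB score."""
        scores = [self.ucb(x) for x in arms]
        return max(range(len(arms)), key=scores.__getitem__)

    def update(self, x, reward, sigma2):
        """Add one observation whose noise variance is bounded by sigma2."""
        if sigma2 <= 0:
            raise ValueError("sigma2 must be positive")
        w = 1.0 / sigma2
        u = [math.sqrt(w) * xi for xi in x]
        au = self._matvec(u)
        denom = 1.0 + sum(ui * ai for ui, ai in zip(u, au))
        # Sherman-Morrison update; A_inv is symmetric, so A_inv u u^T A_inv = au au^T.
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= au[i] * au[j] / denom
        for i in range(self.dim):
            self.b[i] += w * reward * x[i]
```

In this sketch, a round with a small `sigma2` shrinks the confidence width along the observed direction more than a noisy round does, so low-noise observations reduce exploration faster.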

artificial intelligence, data mining, machine learning

2411.05979

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Neural Information Processing Systems, Oct-9-2024, 13:09:06 GMT

A Bayesian Nonparametrics View into Deep Representations

bayesian nonparametric view, deep representation, representation

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Feb-4-2024

EuLagNet: Eulerian Fluid Prediction with Lagrangian Dynamics

Ma, Qilong, Wu, Haixu, Xing, Lanxiang, Wang, Jianmin, Long, Mingsheng

Accurately predicting future fluid states is important in many areas, such as meteorology, oceanology and aerodynamics. However, since the fluid is usually observed from an Eulerian perspective, its active and intricate dynamics are seriously obscured and confounded in static grids, which poses thorny challenges for prediction. This paper introduces a new Lagrangian-guided paradigm to tackle the tangled fluid dynamics. Instead of solely predicting the future based on Eulerian observations, we propose the Eulerian-Lagrangian Dual Recurrent Network (EuLagNet), which captures multiscale fluid dynamics by tracking movements of adaptively sampled key particles on multiple scales and integrating dynamics information over time. Concretely, a EuLag Block is presented to communicate the learned Eulerian and Lagrangian features at each moment and scale, where the motion of tracked particles is inferred from Eulerian observations and their accumulated dynamics information is incorporated into Eulerian fields to guide future prediction. Tracking key particles not only provides a clear and interpretable clue for fluid dynamics but also frees our model from modeling complex correlations among massive grids, which improves efficiency. Experimentally, EuLagNet excels in three challenging fluid prediction tasks, covering both 2D and 3D, simulated and real-world fluids.

eulagnet, eulerian fluid prediction, particle

2402.02425

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hou, Jinyong, Deng, Jeremiah D., Cranefield, Stephen, Din, Xuejie

Variational Transfer Learning using Cross-Domain Latent Modulation

arXiv.org Artificial Intelligence, Jan-31-2024

To successfully apply trained neural network models to new domains, powerful transfer learning solutions are essential. We introduce a novel cross-domain latent modulation mechanism into a variational autoencoder framework to achieve effective transfer learning. Our key idea is to procure deep representations from one data domain and use them to influence the reparameterization of the latent variable of another domain. Specifically, deep representations of the source and target domains are first extracted by a unified inference model and aligned by employing gradient reversal. The learned deep representations are then cross-modulated to the latent encoding of the alternative domain, where consistency constraints are also applied. In an empirical validation that includes a number of transfer learning benchmark tasks for unsupervised domain adaptation and image-to-image translation, our model demonstrates competitive performance, which is also supported by evidence obtained from visualization.

domain adaptation, latent space, representation

2205.15523

Country:

Oceania > New Zealand (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)